Long-term OCR services aim to provide high-quality output to their users at competitive costs. Because users upload complex data, the models must be upgraded regularly. Service providers encourage users to contribute data on which the OCR model fails by rewarding them based on data complexity, readability, and available budget. So far, OCR work has trained models on standard datasets without considering the end users. We propose a strategy of consistently upgrading an existing Handwritten Hindi OCR model three times on the dataset of 15 users. We fix the budget of 4 users for each iteration. For the first iteration, the model trains directly on the dataset from the first four users. For the remaining iterations, all remaining users write a page each, which service providers later analyze to select the 4 (new) best users based on the quality of predictions on the human-readable words. Selected users write 23 more pages for upgrading the model. We upgrade the model with Curriculum Learning (CL) on the data available in the current iteration and compare the subset from previous iterations. The upgraded model is tested on a held-out set of one page each from all 23 users. We provide insights into our investigations on the effect of CL, user selection, and especially the data from unseen writing styles. Our work can be used for long-term OCR services in crowd-sourcing scenarios, benefiting both service providers and end users.
Handwritten Text Recognition (HTR) is more interesting and challenging than printed text recognition due to uneven variations in the handwriting style of the writers, content, and time. HTR becomes more challenging for the Indic languages because (i) multiple characters combine to form conjuncts, which increases the number of characters of the respective languages, and (ii) each Indic script has close to 100 unique basic Unicode characters. Recently, many recognition methods based on the encoder-decoder framework have been proposed to handle such problems. They still face many challenges, such as image blur and incomplete characters due to varying writing styles and ink density. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we enhance the performance of Indic handwritten text recognizers using global semantic information. We use a semantic module in an encoder-decoder framework to extract global semantic information for recognizing Indic handwritten texts. The semantic information is used in both the encoder for supervision and the decoder for initialization. The semantic information is predicted from the word embedding of a pre-trained language model. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.
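The quality of a developed tracking algorithm is evaluated with various performance measures, both with and without ground truth. The existing popular ground-truth-based measures, average center location error (ACLE) and average tracking accuracy (ATA), can sometimes be confusing when quantifying the quality of an algorithm that tracks an object in certain complex environments (e.g., a scaled object, an oriented object, or an object that is both scaled and oriented). In this article, we propose three new auxiliary performance measures based on ground-truth information to evaluate the quality of a developed tracking algorithm in such complex environments. In addition, we develop one performance measure by combining the two existing measures, ACLE and ATA, with the three newly proposed measures, in order to better quantify a developed tracking algorithm under such complex conditions. Examples and experimental results show that the proposed measure is better than the existing measures at quantifying a developed algorithm for tracking objects in such complex environments.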
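The point about scaled objects can be illustrated with a small sketch. It assumes axis-aligned boxes given as `(x, y, width, height)` and takes ATA to be the mean intersection-over-union overlap per frame, which is a common overlap-based definition; the abstract does not state the exact formulas, so the function names and both definitions here are assumptions rather than the authors' implementation.

```python
import math

def center(box):
    """Return the center (cx, cy) of an (x, y, w, h) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def center_location_error(gt, pred):
    """Euclidean distance between ground-truth and predicted box centers."""
    (gx, gy), (px, py) = center(gt), center(pred)
    return math.hypot(gx - px, gy - py)

def overlap_ratio(gt, pred):
    """Intersection over union of two (x, y, w, h) boxes (assumed ATA term)."""
    gx, gy, gw, gh = gt
    px, py, pw, ph = pred
    iw = max(0.0, min(gx + gw, px + pw) - max(gx, px))
    ih = max(0.0, min(gy + gh, py + ph) - max(gy, py))
    inter = iw * ih
    union = gw * gh + pw * ph - inter
    return inter / union if union > 0 else 0.0

def acle(gts, preds):
    """Average center location error over all frames."""
    return sum(center_location_error(g, p) for g, p in zip(gts, preds)) / len(gts)

def ata(gts, preds):
    """Average overlap over all frames (assumed definition of ATA)."""
    return sum(overlap_ratio(g, p) for g, p in zip(gts, preds)) / len(gts)

# A tracker that keeps the right center but reports a box half as wide and tall.
gts = [(0.0, 0.0, 10.0, 10.0)]
preds = [(2.5, 2.5, 5.0, 5.0)]
print(acle(gts, preds), ata(gts, preds))
```

In this toy frame the center error is zero while the overlap is only a quarter of the ground-truth area, so the two measures disagree about the same prediction. That is the kind of ambiguity that motivates auxiliary measures for scaled or oriented targets.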
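Table structure recognition is necessary for a comprehensive understanding of documents. Tables in unstructured business documents are hard to parse because of highly diverse layouts, varying content, and the presence of empty cells. The problem is particularly difficult because identifying individual cells using visual context, linguistic context, or both is itself challenging. Accurately detecting table cells (including empty cells) simplifies structure extraction, so it is the primary focus of our work. We propose a novel object-detection-based deep model that captures the inherent alignment of cells within tables and is fine-tuned for fast optimization. Even with accurate cell detection, recognizing the structure of dense tables can remain challenging, because long-range row/column dependencies are difficult to capture in the presence of cells that span multiple rows or columns. We therefore also aim to improve structure recognition by deriving a novel rectilinear graph-based formulation. From a semantic perspective, we highlight the importance of empty cells in a table. To take these cells into account, we suggest an enhancement to a popular evaluation criterion. Finally, we introduce a modestly sized evaluation dataset with an annotation style inspired by human cognition, to encourage new approaches to the problem. Our framework improves the previous state-of-the-art performance by 2.7% in average F1-score on benchmark datasets.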
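Segmenting grains from their surrounding matrix/cement in thin sections of sandstone is the primary step in computer-aided mineral identification and sandstone classification. Microscopic images of sandstone contain many mineral grains and their surrounding matrix/cement. The distinction between adjacent grains and the matrix is often ambiguous, which makes grain segmentation difficult. Various solutions exist in the literature to handle these problems; however, they are not robust to the varied patterns of sandstone petrography. In this paper, we formulate grain segmentation as a pixel-wise two-class (i.e., grain and background) semantic segmentation task. We develop a deep learning-based, end-to-end trainable framework named the Deep Semantic Grain Segmentation Network (DSGSN), a data-driven method that provides a generic solution. To the best of the authors' knowledge, this is the first work to explore deep neural networks for the grain segmentation problem. Extensive experiments on microscopic images show that our method obtains better segmentation accuracy than various segmentation architectures with more parameters.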
The open-radio access network (O-RAN) embraces cloudification and network function virtualization for base-band function processing by dis-aggregated radio units (RUs), distributed units (DUs), and centralized units (CUs). These enable the cloud-RAN vision in full, where multiple mobile network operators (MNOs) can install their proprietary or open RUs, but lease on-demand computational resources for DU-CU functions from commonly available open-clouds via open x-haul interfaces. In this paper, we propose and compare the performances of min-max fairness and Vickrey-Clarke-Groves (VCG) auction-based x-haul and DU-CU resource allocation mechanisms to create a multi-tenant O-RAN ecosystem that is sustainable for small, medium, and large MNOs. The min-max fair approach minimizes the maximum OPEX of RUs through cost-sharing proportional to their demands, whereas the VCG auction-based approach minimizes the total OPEX for all resources utilized while extracting truthful demands from RUs. We consider time-wavelength division multiplexed (TWDM) passive optical network (PON)-based x-haul interfaces where PON virtualization technique is used to flexibly provide optical connections among RUs and edge-clouds at macro-cell RU locations as well as open-clouds at the central office locations. Moreover, we design efficient heuristics that yield significantly better economic efficiency and network resource utilization than conventional greedy resource allocation algorithms and reinforcement learning-based algorithms.
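As a reminder of how VCG pricing elicits truthful reports, here is a generic toy sketch, not the paper's allocation mechanism. It assumes a simplified forward auction in which `k` identical resource units are sold and each bidder wants exactly one unit; the bidder names and values are hypothetical. Each winner pays the externality it imposes on the others.

```python
def vcg_unit_demand(bids, k):
    """Toy VCG auction: k identical units, each bidder wants one unit.

    bids maps a bidder name to its reported value for one unit.
    Returns (winners, payments), where payments maps each winner to
    the welfare loss it causes to the other bidders.
    """
    def best(names):
        # The welfare-maximizing allocation gives units to the k highest bids.
        return sorted(names, key=lambda n: bids[n], reverse=True)[:k]

    winners = best(list(bids))
    payments = {}
    for w in winners:
        others = [n for n in bids if n != w]
        # Welfare the others could reach if w were absent.
        welfare_without_w = sum(bids[n] for n in best(others))
        # Welfare the others actually get with w present.
        welfare_others_with_w = sum(bids[n] for n in winners if n != w)
        payments[w] = welfare_without_w - welfare_others_with_w
    return winners, payments

# Hypothetical bidders competing for two units.
bids = {"mno_a": 10, "mno_b": 7, "mno_c": 4, "mno_d": 2}
winners, payments = vcg_unit_demand(bids, k=2)
print(winners, payments)
```

In this special case every winner pays the highest losing bid, so a winner's payment does not depend on its own report. That independence is what removes the incentive to misreport demand.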
The Indian e-commerce industry has evolved over the last decade and is expected to grow over the next few years. The focus has now shifted to turnaround time (TAT) due to the emergence of many third-party logistics providers and higher customer expectations. The key consideration for delivery providers is to balance their overall operating costs while meeting the TAT promised to their customers. E-commerce delivery partners operate through a network of facilities whose strategic locations help operations run efficiently. In this work, we identify the locations of hubs throughout the country and their corresponding mapping to the distribution centers. The objective is to minimize the total network cost while adhering to TAT. We use a Genetic Algorithm and leverage business constraints to reduce the solution search space and hence the solution time. The results indicate an improvement of 9.73% in TAT compliance compared with the current scenario.
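With the advent of the digital era, everyday tasks are being automated thanks to technological advances. However, technology has yet to provide people with adequate tools and safeguards. As the internet connects more and more devices around the globe, the problem of securing connected devices keeps escalating. Data theft, identity theft, fraudulent transactions, password compromises, and system breaches are becoming routine daily news. Recent advances in artificial intelligence have intensified the threat of cyber-attacks. AI is applied in almost every field of science and engineering. The intervention of AI not only automates specific tasks but also improves efficiency. It is therefore evident that such widespread adoption is very attractive to cybercriminals. As a result, conventional cyber threats and attacks have now become "intelligent" threats. This article discusses cybersecurity and cyber threats, along with both conventional and intelligent ways of defending against cyber-attacks. It closes with a discussion of the potential future prospects of AI in cybersecurity.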
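This paper proposes transfer initialization with modified fully connected layers for COVID-19 diagnosis. Convolutional neural networks (CNNs) have achieved remarkable results in image classification. However, training a high-performing model is a very complicated and time-consuming process because of the complexity of image recognition applications. Transfer learning, on the other hand, is a relatively new learning method that has been used in many fields to achieve good performance with less computation. In this research, PyTorch pre-trained models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset for the first time as initialization, with modified fully connected layers. The PyTorch pre-trained models used here were previously trained on ImageNet. The proposed model was developed and verified in a Kaggle notebook, and it reached an outstanding accuracy of 99.77% without requiring a huge amount of computational time during network training. We also applied the same methodology to the SIIM-FISABIO-RSNA COVID-19 Detection dataset and achieved 80.01% accuracy. In contrast, previous methods need a huge amount of computational time during training to reach a high-performing model. Code is available at the following link: github.com/dipuk0506/spinalnet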
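Mean-field control (MFC) has recently been shown to be a scalable tool for approximately solving large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to the unconstrained cumulative reward maximization framework. In this paper, we show that the MFC approach can be used to approximate the MARL problem even in the presence of constraints. Specifically, we prove that an $n$-agent constrained MARL problem, in which the state and action spaces of each individual agent have sizes $|\mathcal{X}|$ and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error $e \triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{n}\right)$. In the special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e = \mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{n})$. In addition, we provide a natural policy gradient-based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.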
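To get a feel for how the stated bounds scale, the following sketch evaluates them numerically. The constants hidden by the $\mathcal{O}(\cdot)$ notation are not given in the abstract, so they are set to 1 here purely as an assumption; the function names are hypothetical and only the ratios between outputs are meaningful.

```python
import math

def mfc_error_bound(n_agents, n_states, n_actions, constant=1.0):
    """General-case bound: constant * (sqrt|X| + sqrt|U|) / sqrt(n)."""
    return constant * (math.sqrt(n_states) + math.sqrt(n_actions)) / math.sqrt(n_agents)

def mfc_error_bound_special(n_agents, n_states, constant=1.0):
    """Special-case bound: constant * sqrt|X| / sqrt(n)."""
    return constant * math.sqrt(n_states) / math.sqrt(n_agents)

def sample_complexity(error, constant=1.0):
    """Sample complexity scaling: constant * error ** -6."""
    return constant * error ** -6

# Quadrupling the number of agents halves the error bound ...
e_small = mfc_error_bound(100, n_states=16, n_actions=4)
e_large = mfc_error_bound(400, n_states=16, n_actions=4)
# ... and halving the target error multiplies the sample bound by 2**6.
ratio = sample_complexity(e_large) / sample_complexity(e_small)
print(e_small, e_large, ratio)
```

The steep $e^{-6}$ dependence means that tightening the target error is far more expensive in samples than the $1/\sqrt{n}$ improvement from adding agents might suggest.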